Parallel Graph-Based Semi-Supervised Learning

نویسنده

  • Jeff Bilmes
چکیده

Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled and relatively large amounts of unlabeled data. In many applications, annotating training data is time-consuming and error prone. Speech recognition is the typical example, which requires large amounts of meticulously annotated speech data (Evermann et al., 2005) to produce an accurate system. In the case of document classification for Internet search, it is not even feasible to accurately annotate a relatively large number of web pages for all categories of potential interest. SSL lends itself as a useful technique in many machine learning applications as one needs only to annotate relatively small amounts of the available data. SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed for prediction on only a closed data set, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani et al., 2005; Zhu, 2005a). Chapter 25 in (Chapelle et al., 2007) gives a full discussion on the relationship between SSL and transductive learning. In this chapter, SSL refers to the semi-supervised transductive classification problem. Let x ∈ X denote the input to the decision function (classifier), f , and y ∈ Y denote its output label, i.e., f : X→ Y. In most cases f(x) = argmaxy∈Y p(y|x). In SSL, certain reasonable assumptions are made so that properties of the distribution p(x) (which is available from the unlabeled data sampled from p(x)) can influence p(y|x). These assumptions are as follows:

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described in [13] for the construction of mini-batches for stochastic gradient descent (SGD) based on synthesized partitions of an affinity graph that are consistent wi...

متن کامل

Large-Scale Graph-based Transductive Inference

We consider the issue of scalability of graph-based semi-supervised learning (SSL) algorithms. In this context, we propose a fast graph node ordering algorithm that improves (parallel) spatial locality by being cache cognizant. This approach allows for a near linear speedup on a shared-memory parallel machine to be achievable, and thus means that graph-based SSL can scale to very large data set...

متن کامل

Parallel and Distributed Approaches for Graph Based Semi-supervised Learning

Two approaches for graph based semi-supervised learning are proposed. The first approach is based on iteration of an affine map. A key element of the affine map iteration is sparse matrix-vector multiplication, which has several very efficient parallel implementations. The second approach belongs to the class of Markov Chain Monte Carlo (MCMC) algorithms. It is based on sampling of nodes by per...

متن کامل

Cross-Domain Kernel Induction for Transfer Learning

The key question in transfer learning (TL) research is how to make model induction transferable across different domains. Common methods so far require source and target domains to have a shared/homogeneous feature space, or the projection of features from heterogeneous domains onto a shared space. This paper proposes a novel framework, which does not require a shared feature space but instead ...

متن کامل

Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning

Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012